AITopics

Genre: Research Report > Experimental Study (1.00)

Industry: Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Data Science > Data Mining > Big Data (0.66)

Neural Information Processing Systems, Apr-24-2026, 10:31:06 GMT

024677efb8e4aee2eaeef17b54695bbe-Supplemental.pdf

artificial intelligence, machine learning, mujoco environment, (18 more...)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems, Apr-24-2026, 10:31:02 GMT

024677efb8e4aee2eaeef17b54695bbe-Paper.pdf

machine learning, reinforcement learning, safe region, (17 more...)

Country: Asia (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.69)

Neural Information Processing Systems, Feb-7-2026, 06:52:57 GMT

Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods

In reinforcement learning, continuous time is often discretized by a time scale δ, to which the resulting performance is known to be highly sensitive. In this work, we seek to find a δ-invariant algorithm for policy gradient (PG) methods, which performs well regardless of the value of δ. We first identify the underlying reasons that cause PG methods to fail as δ → 0, proving that the variance of the PG estimator can diverge to infinity in stochastic environments under a certain assumption of stochasticity. While durative actions or action repetition can be employed to achieve δ-invariance, previous action repetition methods cannot immediately react to unexpected situations in stochastic environments. We thus propose a novel δ-invariant method named Safe Action Repetition (SAR), applicable to any existing PG algorithm. SAR can handle the stochasticity of environments by adaptively reacting to changes in states during action repetition.
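A minimal sketch of the repetition rule described above, assuming the "safe region" (a keyword listed for this paper) is a Euclidean ball of a fixed radius around the state where the action was chosen. The function name, the toy dynamics, and passing the radius in as a fixed input are illustrative assumptions, not the paper's interface. The action is repeated until the state leaves that ball, and only then is the policy queried again.

```python
import math
from typing import Callable, Tuple

State = Tuple[float, ...]
StepFn = Callable[[State, float], State]


def safe_action_repetition(
    step_fn: StepFn,
    state: State,
    action: float,
    radius: float,
    max_repeats: int = 10_000,
) -> Tuple[State, int]:
    """Repeat `action` until the state leaves a ball of `radius` around the
    state where the action was chosen (hypothetical SAR-style rule)."""
    anchor = tuple(state)
    repeats = 0
    while repeats < max_repeats:
        state = tuple(step_fn(state, action))
        repeats += 1
        # Hand control back to the policy as soon as the state escapes the region.
        if math.dist(state, anchor) > radius:
            break
    return state, repeats


def make_step(delta: float) -> StepFn:
    """Hypothetical toy dynamics dx/dt = a, discretized with time scale `delta`."""
    return lambda s, a: (s[0] + a * delta,)


if __name__ == "__main__":
    for delta in (1.0, 0.5, 0.25):
        final, n = safe_action_repetition(make_step(delta), (0.0,), 1.0, 3.5)
        print(delta, final, n)
```

With δ = 1.0, 0.5 and 0.25 the number of primitive steps grows (4, 8, 15), while the elapsed time before the policy acts again stays between 3.75 and 4 time units. That is the sense in which a state-based stopping rule is less sensitive to δ than a fixed repeat count.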

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Country:

Asia > Middle East > Jordan (0.04)
Europe > France (0.04)
Asia > Vietnam > Long An Province (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Yazdani, Pouria, Rezaali, Arash, Abdoos, Monireh

Semi Centralized Training Decentralized Execution Architecture for Multi Agent Deep Reinforcement Learning in Traffic Signal Control

arXiv.org Artificial Intelligence, Dec-5-2025

Traffic congestion is a major and complex challenge for cities worldwide with the rapid growth of urbanization and vehicle ownership. Longer commute times, excessive fuel consumption, and elevated air pollution levels are direct consequences of over-saturated roads. For instance, according to the 2024 INRIX Global Traffic Scorecard, individual commuters in Istanbul, New York City, and Chicago experienced total annual delays of about 105, 102, and 102 hours, respectively, underscoring the magnitude of intersection-driven delays in major metros (INRIX). Within urban networks, signalized intersections are the dominant bottlenecks: the policies implemented at these intersections allocate scarce space-time among competing traffic streams and therefore largely determine corridor-level delay, queues, and emissions. Reinforcement learning (RL) has become a standard approach to adaptive traffic signal control (ATSC), casting phase selection and timing as a sequential decision problem that optimizes long-horizon objectives such as delay, throughput, and emissions under nonstationary demand (Yau et al., 2017). Deep RL (DRL) extends this by using function approximation to process rich state representations, from detector queues to trajectories and graph-structured networks, enabling policies that generalize across varying traffic flows and topologies (Zhao et al., 2024). Collectively, this body of work motivates moving beyond single-intersection controllers toward coordinated, network-level solutions and sets the stage for multi-agent formulations.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

2512.04653

Country:

North America > United States > New York (0.24)
North America > United States > Illinois > Cook County > Chicago (0.24)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.24)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.24)

Genre: Research Report (0.81)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.34)

Chatterjee, Palash, Khardon, Roni

Improving planning and MBRL with temporally-extended actions

arXiv.org Artificial Intelligence, Oct-23-2025

Continuous-time systems are often modeled using discrete-time dynamics, but this requires a small simulation step to maintain accuracy. In turn, this requires a large planning horizon, which leads to computationally demanding planning problems and reduced performance. Previous work in model-free reinforcement learning has partially addressed this issue using action repeats, where a policy is learned to determine a discrete action duration. Instead, we propose to control the continuous decision timescale directly by using temporally-extended actions and letting the planner treat the duration of the action as an additional optimization variable along with the standard action variables. This additional structure has multiple advantages. It speeds up simulation time of trajectories and, importantly, it allows for deep horizon search in terms of primitive actions while using a shallow search depth in the planner. In addition, in the model-based reinforcement learning (MBRL) setting, it reduces compounding errors from model learning and improves training time for models. We show that this idea is effective and that the range for action durations can be automatically selected using a multi-armed bandit formulation and integrated into the MBRL framework. An extensive experimental evaluation, both in planning and in MBRL, shows that our approach yields faster planning and better solutions, and that it enables solutions to problems that are not solved in the standard formulation.
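The abstract says only that the range of action durations is chosen with "a multi-armed bandit formulation". A minimal sketch of one such formulation, assuming each arm is a candidate (min, max) duration range and using UCB1 as the bandit algorithm. UCB1 is our choice here, and `evaluate` is a hypothetical hook standing in for "run the planner with durations restricted to this range and report its return".

```python
import math
from typing import Callable, List, Sequence, Tuple

DurationRange = Tuple[int, int]  # (min, max) primitive steps per extended action


def ucb1_select(counts: List[int], means: List[float], t: int,
                c: float = math.sqrt(2)) -> int:
    """UCB1: try every arm once, then maximize mean + exploration bonus."""
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return max(
        range(len(counts)),
        key=lambda i: means[i] + c * math.sqrt(math.log(t) / counts[i]),
    )


def select_duration_range(
    arms: Sequence[DurationRange],
    evaluate: Callable[[DurationRange], float],
    rounds: int,
) -> Tuple[DurationRange, List[int]]:
    """Spread `rounds` planner runs over candidate duration ranges and return
    the range with the best average return plus the per-arm pull counts."""
    counts = [0] * len(arms)
    means = [0.0] * len(arms)
    for t in range(1, rounds + 1):
        i = ucb1_select(counts, means, t)
        reward = evaluate(arms[i])  # hypothetical: planner return with this range
        counts[i] += 1
        means[i] += (reward - means[i]) / counts[i]  # incremental mean update
    best = max(range(len(arms)), key=lambda i: means[i])
    return arms[best], counts
```

In use, most of the planning budget should concentrate on the range that yields the best returns, while the other ranges keep being sampled occasionally in case the environment rewards them later.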

data mining, machine learning, reinforcement learning, (21 more...)

2505.15754

Genre: Research Report > Experimental Study (1.00)

Industry: Education > Educational Setting > Online (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Data Science > Data Mining > Big Data (0.66)

arXiv.org Artificial Intelligence, Sep-19-2025

ClearFairy: Capturing Creative Workflows through Decision Structuring, In-Situ Questioning, and Rationale Inference

Son, Kihoon, Choi, DaEun, Kim, Tae Soo, Kim, Young-Ho, Yun, Sangdoo, Kim, Juho

Capturing professionals' decision-making in creative workflows is essential for reflection, collaboration, and knowledge sharing, yet existing methods often leave rationales incomplete and implicit decisions hidden. To address this, we present the CLEAR framework, which structures reasoning into cognitive decision steps: linked units of actions, artifacts, and self-explanations that make decisions traceable. Building on this framework, we introduce ClearFairy, a think-aloud AI assistant for UI design that detects weak explanations, asks lightweight clarifying questions, and infers missing rationales to ease the knowledge-sharing burden. In a study with twelve creative professionals, 85% of ClearFairy's inferred rationales were accepted, increasing strong explanations from 14% to over 83% of decision steps without adding cognitive demand. The captured steps also enhanced generative AI agents in Figma, yielding next-action predictions better aligned with professionals and producing more coherent design outcomes. For future research on human knowledge-grounded creative AI agents, we release a dataset of 417 captured decision steps.
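A minimal sketch of how a decision step, as described above (linked actions, artifacts, and a self-explanation), might be recorded, together with the share-of-strong-explanations metric the abstract reports. The field names, the three-level strength label, and both helper functions are hypothetical; the abstract does not specify ClearFairy's data schema or how explanation strength is judged.

```python
from dataclasses import dataclass, field
from typing import List, Literal

Strength = Literal["none", "weak", "strong"]  # hypothetical labels


@dataclass
class DecisionStep:
    """Hypothetical record for one cognitive decision step."""
    actions: List[str] = field(default_factory=list)    # e.g. UI edits performed
    artifacts: List[str] = field(default_factory=list)  # e.g. frames or components touched
    explanation: str = ""                                 # stated or inferred rationale
    strength: Strength = "none"
    inferred: bool = False  # True if the rationale was inferred, not stated


def needs_clarifying_question(step: DecisionStep) -> bool:
    """Flag steps whose explanation is missing or weak."""
    return step.strength != "strong"


def strong_share(steps: List[DecisionStep]) -> float:
    """Fraction of decision steps with a strong explanation."""
    if not steps:
        return 0.0
    return sum(s.strength == "strong" for s in steps) / len(steps)
```

Keeping `inferred` separate from `strength` lets an analysis distinguish rationales the professional stated from ones the assistant proposed and the professional accepted.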

knowledge management, large language model, machine learning, (24 more...)

2509.14537

Country:

Europe (1.00)
North America > Canada (0.67)
North America > United States > California (0.67)
North America > United States > New York > New York County > New York City (0.15)

Genre:

Workflow (1.00)
Research Report > Experimental Study (0.68)
Research Report > New Finding (0.68)
Personal > Interview (0.45)

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.45)

Technology:

Information Technology > Knowledge Management (1.00)
Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Communications (1.00)
(6 more...)

Middelhuis, Jeroen, Bukhsh, Zaharah, Adan, Ivo, Dijkman, Remco

A Rollout-Based Algorithm and Reward Function for Resource Allocation in Business Processes

arXiv.org Artificial Intelligence, Sep-3-2025

Resource allocation plays a critical role in minimizing cycle time and improving the efficiency of business processes. Recently, Deep Reinforcement Learning (DRL) has emerged as a powerful technique to optimize resource allocation policies in business processes. In the DRL framework, an agent learns a policy through interaction with the environment, guided solely by reward signals that indicate the quality of its decisions. However, existing algorithms are not suitable for dynamic environments such as business processes. Furthermore, existing DRL-based methods rely on engineered reward functions that approximate the desired objective, but a misalignment between reward and objective can lead to undesired decisions or suboptimal policies. To address these issues, we propose a rollout-based DRL algorithm and a reward function to optimize the objective directly. Our algorithm iteratively improves the policy by evaluating execution trajectories following different actions. Our reward function directly decomposes the objective function of minimizing the cycle time, such that trial-and-error reward engineering becomes unnecessary. We evaluated our method in six scenarios, for which the optimal policy can be computed, and on a set of increasingly complex, realistically sized process models. The results show that our algorithm can learn the optimal policy for the scenarios and outperform or match the best heuristics on the realistically sized business processes.
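The abstract describes improving a policy "by evaluating execution trajectories following different actions". A minimal sketch of a generic one-step rollout improvement in that spirit, not the authors' algorithm: for each candidate action, simulate several trajectories that take the action first and then follow a base policy, and keep the action with the lowest mean cost. The toy queue simulator, its 0.5 arrival probability, and all function names are hypothetical. Its per-step cost is the number of cases in the system, so the summed cost equals the total time cases spend in the system. This illustrates how a cycle-time objective can decompose into per-step terms.

```python
import random
from statistics import mean
from typing import Callable, Sequence, Tuple

State = int
Action = int
StepFn = Callable[[State, Action, random.Random], Tuple[State, float]]


def rollout_choose(
    state: State,
    actions: Sequence[Action],
    step: StepFn,
    base_policy: Callable[[State], Action],
    horizon: int,
    n_rollouts: int,
    seed: int = 0,
) -> Action:
    """Pick the action whose rollouts (action first, then base_policy)
    have the lowest mean accumulated cost."""

    def rollout_cost(first: Action) -> float:
        totals = []
        for i in range(n_rollouts):
            rng = random.Random(seed + i)  # common random numbers across actions
            s, total = step(state, first, rng)
            for _ in range(horizon):
                s, c = step(s, base_policy(s), rng)
                total += c
            totals.append(total)
        return mean(totals)

    return min(actions, key=rollout_cost)


def queue_step(s: State, a: Action, rng: random.Random) -> Tuple[State, float]:
    """Hypothetical toy process: action 1 completes one waiting case, action 0
    idles; a new case arrives with probability 0.5. Cost = cases in system."""
    s = max(s - 1, 0) if a == 1 else s
    s += 1 if rng.random() < 0.5 else 0
    return s, float(s)
```

Seeding rollout `i` identically for every candidate action means the actions are compared on the same simulated arrivals, which removes much of the noise from the comparison.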

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2504.1125

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

arXiv.org Artificial Intelligence, Aug-11-2025

HALO: Hindsight-Augmented Learning for Online Auto-Bidding

Dong, Pusen, Cao, Chenglong, Zhou, Xinyu, You, Jirong, Xu, Linhe, Xu, Feifan, Yuan, Shuo

Digital advertising platforms operate millisecond-level auctions through Real-Time Bidding (RTB) systems, where advertisers compete for ad impressions through algorithmic bids. This dynamic mechanism enables precise audience targeting but introduces profound operational complexity due to advertiser heterogeneity: budgets and ROI targets span orders of magnitude across advertisers, from individual merchants to multinational brands. This diversity creates a demanding adaptation landscape for Multi-Constraint Bidding (MCB). Traditional auto-bidding solutions fail in this environment due to two critical flaws: 1) severe sample inefficiency, where failed explorations under specific constraints yield no transferable knowledge for new budget-ROI combinations, and 2) limited generalization under constraint shifts, as they ignore physical relationships between constraints and bidding coefficients. To address this, we propose HALO: Hindsight-Augmented Learning for Online Auto-Bidding. HALO introduces a theoretically grounded hindsight mechanism that repurposes all explorations into training data for arbitrary constraint configurations via trajectory reorientation. Further, it employs a B-spline functional representation, enabling continuous, derivative-aware bid mapping across constraint spaces. HALO ensures robust adaptation even when budget/ROI requirements differ drastically from training scenarios. Industrial dataset evaluations demonstrate the superiority of HALO in handling multi-scale constraints, reducing constraint violations while improving GMV.
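One way to read "repurposes all explorations into training data for arbitrary constraint configurations" is hindsight relabeling in the style of hindsight experience replay. An episode that missed its own budget and ROI targets is still a correct example of which bidding coefficient produces the spend and ROI it actually achieved. A minimal sketch under that assumption follows. The episode fields and function names are hypothetical, this is not HALO's actual trajectory-reorientation procedure, and the B-spline mapping is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple

Constraint = Tuple[float, float]  # (budget, ROI)


@dataclass
class BidEpisode:
    """One exploration episode (hypothetical fields)."""
    target_budget: float
    target_roi: float
    bid_coef: float  # bidding coefficient actually used
    spend: float     # realized spend
    value: float     # realized value, e.g. GMV


def hindsight_relabel(episodes: List[BidEpisode]) -> List[Tuple[Constraint, float]]:
    """Turn every episode, including ones that missed their targets, into a
    (achieved constraint -> bid coefficient) training pair."""
    pairs = []
    for ep in episodes:
        if ep.spend <= 0:
            continue  # no spend: achieved ROI is undefined, so skip the episode
        achieved = (ep.spend, ep.value / ep.spend)
        pairs.append((achieved, ep.bid_coef))
    return pairs
```

A model trained on these pairs learns a mapping from constraint configurations to bid coefficients, including configurations that no episode was originally targeting.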

constraint, machine learning, reinforcement learning, (20 more...)

2508.03267

Genre: Research Report (0.82)

Industry: Marketing (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)

arXiv.org Artificial Intelligence, Feb-27-2025

A Generative Model Enhanced Multi-Agent Reinforcement Learning Method for Electric Vehicle Charging Navigation

Qi, Tianyang, Chen, Shibo, Zhang, Jun

With the widespread adoption of electric vehicles (EVs), guiding EV drivers to a cost-effective charging station has become an important yet challenging problem due to dynamic traffic conditions, fluctuating electricity prices, and potential competition from other EVs. State-of-the-art deep reinforcement learning (DRL) algorithms for this task still require global information about all EVs at the execution stage, which not only increases communication costs but also raises privacy issues among EV drivers. To overcome these drawbacks, we introduce a novel generative model-enhanced multi-agent DRL algorithm that utilizes only the EV's local information while achieving performance comparable to these state-of-the-art algorithms. Specifically, the policy network is implemented on the EV side, and a Conditional Variational Autoencoder-Long Short Term Memory (CVAE-LSTM)-based recommendation model is developed to provide recommendation information. Furthermore, a novel future charging competition encoder is designed to effectively compress global information, enhancing training performance. The multi-gradient descent algorithm (MGDA) is also utilized to adaptively balance the weight between the two parts of the training objective, resulting in a more stable training process. Simulations are conducted based on a practical area in Xi'an, China. Experimental results show that our proposed algorithm, which relies on local information, outperforms existing local information-based methods and achieves less than 8% performance loss compared to global information-based methods.
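For two objectives, MGDA's weighting step has a well-known closed form. It picks the convex combination α·g1 + (1 − α)·g2 of the two gradients with the smallest norm, where α = ((g2 − g1)·g2) / ‖g1 − g2‖², clipped to [0, 1]. A minimal sketch follows, assuming the gradients of "the two parts of the training objective" are available as flattened vectors. How the paper obtains and applies them is not stated in the abstract, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def mgda_two_task(g1: Sequence[float], g2: Sequence[float]) -> Tuple[float, List[float]]:
    """Min-norm convex combination alpha*g1 + (1 - alpha)*g2 of two gradients."""
    diff = [a - b for a, b in zip(g1, g2)]  # g1 - g2
    denom = sum(d * d for d in diff)         # ||g1 - g2||^2
    if denom == 0.0:
        alpha = 0.5  # identical gradients: every weight gives the same direction
    else:
        # alpha = ((g2 - g1) . g2) / ||g1 - g2||^2, clipped to [0, 1]
        alpha = -sum(d * b for d, b in zip(diff, g2)) / denom
        alpha = min(1.0, max(0.0, alpha))
    combined = [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]
    return alpha, combined
```

For orthogonal unit gradients the weights split evenly (α = 0.5). When one gradient is a scaled-up copy of the other, α is clipped so that the shorter gradient is used alone. The resulting weight adapts at every step, which is the balancing behavior the abstract attributes to MGDA.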

algorithm, global state, information, (12 more...)